The main aim of this research report was to investigate where Kiva loans go and who receives them, as this would provide insight to both Kiva itself and potential Kiva lenders. We found that most Kiva loans were directed into the Agriculture, Food and Retail sectors and most commonly went to borrowers in countries with relatively low GDP per capita. We also found that the vast majority of Kiva fundees were women, which we attribute to the high levels of inequality found in Kiva’s most common sectors and countries.
Kiva is a service aimed at providing small loans to the world’s unbanked population.
The data set used to generate our research questions comes from the Kiva Crowdfunding “Data Science For Good” open data initiative. This initiative was created so members of the public could help Kiva better understand the levels of poverty in areas where it had active loans (“Data Science for Good: Kiva Crowdfunding” 2018). The data has a CC0: Public Domain licence, meaning that we are free to use and distribute the data as we wish (“Creative Commons — CC0 1.0 Universal” 2023).
As the data was uploaded by Kiva itself for the purpose of
finding real insights through an open competition on
Kaggle, it is original (a primary source) and unaggregated,
and can therefore be considered reliable. The data has since been edited by a
community member called mfab, which may raise some concerns;
however, as Kiva is the ‘Owner’, we assume that it has approved
this editor and that the data remains reliable. One limitation
is that the data only covers 2014 to 2017, so any
deviations from long-established trends caused by the COVID-19
pandemic cannot be examined, although they would have made
for an interesting report.
Some wrangling of the Kiva datasets was required to generate the
plots. Most of the wrangling involved grouping and
summarising to generate new dataframes which isolated variables for
comparison. The geographical analysis undertaken in IQ 3 also
required de-normalising the two supplementary datasets provided by Kiva
(kiva_loan_ids and kiva_loan_regions) to find
the three-letter ISO country code that went along with every loan. This
process is carried out in the data setup section below.
Potential stakeholders for this report are government organisations and charitable services, including Kiva itself, as they would want to be informed about where their money is being sent and how it is being used.
library(tidyverse)
library(tmap)
library(countrycode)
library(janitor)
library(plotly)
# Kiva loans dataset
kiva_loans <- read_csv("data/kiva_loans.csv")
# Two supplementary Kiva datasets covering loan themes and IDs
kiva_loan_ids <- read_csv("data/loan_theme_ids.csv")
kiva_loan_regions <- read_csv("data/loan_themes_by_region.csv")
# An external kaggle dataset with a list of countries and corresponding regions
countries <- read_csv("data/countries.csv")
The Kiva loans datasets are linked by two keys:
id and Loan Theme ID. The data has been
normalised such that if we merge kiva_loan_ids with
kiva_loan_regions we will be able to extract the three-letter
ISO country code related to each loan, which, in combination with the
tmap package, will allow us to conduct a geographical
analysis of Kiva loans.
# tmap in-built World dataset
data("World")
kiva_loan_ids_subset <- kiva_loan_ids %>%
  select(c("id", "Loan Theme ID")) %>%
  rename(loan_theme_id = `Loan Theme ID`)
kiva_loan_regions_subset <- kiva_loan_regions %>%
  select(c("Loan Theme ID", "country", "ISO")) %>%
  rename(iso = ISO,
         loan_theme_id = `Loan Theme ID`) %>%
  distinct() # remove duplicate rows
# Perform merge
kiva_loans_themeid <- inner_join(kiva_loans, kiva_loan_ids_subset, by = "id")
kiva_loans_PKs <- inner_join(kiva_loans_themeid, kiva_loan_regions_subset, by = c("country", "loan_theme_id"))
Note that in the code above we performed inner joins across all three datasets. Because of NAs and missing data, this process inevitably causes some data loss. In our case, we lost approximately 30,000 rows of data (\(\sim 4\%\) of the original dataset). We consider the benefit of working with a complete dataset (with no missing loan theme IDs or ISO codes) to be enough to justify this small loss.
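The size of this loss can be measured directly rather than estimated. The following is a minimal sketch on made-up toy data (`loans_toy` and `theme_ids_toy`, and all of their values, are hypothetical stand-ins rather than the Kiva data). It compares row counts before and after an inner join and uses `anti_join()` to list the rows that found no match.

```r
library(dplyr)

# Hypothetical toy tables standing in for kiva_loans and kiva_loan_ids_subset
loans_toy <- data.frame(
  id = c(1L, 2L, 3L, 4L, 5L),
  loan_amount = c(300, 575, 150, 200, 400)
)
theme_ids_toy <- data.frame(
  id = c(1L, 2L, 4L),
  loan_theme_id = c("a1", "a2", "a3")
)

# Rows kept by the inner join
joined <- inner_join(loans_toy, theme_ids_toy, by = "id")

# Rows of loans_toy with no match in theme_ids_toy (the rows the join drops)
dropped <- anti_join(loans_toy, theme_ids_toy, by = "id")

# Number and percentage of rows lost
rows_lost <- nrow(loans_toy) - nrow(joined)
pct_lost <- 100 * rows_lost / nrow(loans_toy)
```

The row-count difference equals the number of dropped rows only when the join key is unique in the lookup table; duplicated keys would add rows to the joined result.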
# Checking for any missing (NA) values in the two columns of importance.
# anyNA() inspects the values; is.null() would only test whether the column exists.
if (anyNA(kiva_loans_PKs$iso) || anyNA(kiva_loans_PKs$loan_theme_id)) {
  print("Null values present in `iso` or `loan_theme_id` columns.")
} else {
  print("No null values present in `iso` or `loan_theme_id` columns.")
}
## [1] "No null values present in `iso` or `loan_theme_id` columns."
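When checking a column for missing data in R, it helps to keep `is.null()` and `anyNA()` apart: the first tests whether an object is `NULL` (for a data frame column, whether it exists at all), while the second tests whether any value is `NA`. The sketch below illustrates the difference on a small invented data frame (`df_toy` and its values are hypothetical, not taken from the Kiva data).

```r
# Hypothetical toy data with one missing value in each column
df_toy <- data.frame(
  iso = c("KHM", NA, "PHL"),
  loan_theme_id = c("a1", "a2", NA)
)

# The column exists, so is.null() is FALSE even though it holds an NA
col_is_null <- is.null(df_toy$iso)   # FALSE

# anyNA() looks at the values themselves
col_has_na <- anyNA(df_toy$iso)      # TRUE

# Count of missing values per column
na_counts <- colSums(is.na(df_toy[, c("iso", "loan_theme_id")]))
```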
To answer this question, an interactive scatter plot was produced using the code below. Hovering over a data point brings up the country’s name, its GDP per capita and its total loan sum. Both axes use a base-10 logarithmic scale to spread out the data so that relationships are easier to see.
# Grouping by country and summing the loan amount (in million USD) for the Kiva dataset
kiva_loan_sum <- kiva_loans %>%
  group_by(country) %>%
  summarise(loan_sum = signif(sum(loan_amount) * 10^(-6), 5))
# Selecting the useful data from World in tmap and renaming the country column
world_gdp <- World %>%
  select(c("name", "gdp_cap_est")) %>%
  rename(country = name)
# Merging the world_gdp data frame and the kiva_loan_sum data frame (matching countries only)
kiva_gdp <- merge(kiva_loan_sum, world_gdp, all.x = FALSE, all.y = FALSE)
# Changing the column names for the interactive part of the plot
colnames(kiva_gdp) <- c("Country", "Loan Sum", "GDP per Capita")
# Generating the plot with the axes in log10
plot_kiva_gdp <- ggplot(kiva_gdp, aes(x = `GDP per Capita`, y = `Loan Sum`, country = `Country`)) +
  geom_point(colour = "black") +
  scale_x_continuous(trans = 'log10') +
  scale_y_continuous(trans = 'log10') +
  labs(x = "GDP per capita (USD)", y = "Loan Sum (Million USD)", title = "Total sum of Kiva loans against the GDP per capita for each country")
ggplotly(plot_kiva_gdp)
The GDP data used in the interactive scatter plot above is pulled
from the World dataset found within the tmap
package. The plot shows slight clustering towards the top left. This led
us to initially suspect that a relationship may exist between a lower
GDP per capita and a higher amount of loaned money. However, when we
attempted to fit a regression line to the data, no obvious relationship
could be drawn. What we can discern from the data is that it is
right-skewed, indicating that countries with lower GDP per capita make up
most of the total loans. Many people in these countries don’t have access to
financial services or even banking, creating a need for charitable
services like Kiva. It is Kiva’s goal to provide such individuals with
access to loans (“Learn More about Kiva’s Mission” 2023). Our data therefore
demonstrates the free market in action: those who live in countries with
low GDP per capita need Kiva, and Kiva needs them.
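One way to check for a relationship on a log-log scale is to fit a linear model to the log-transformed values. The sketch below uses invented toy numbers (not Kiva or World data) and hypothetical column names. It illustrates the general approach and is not necessarily the regression attempted for this report.

```r
# Hypothetical toy values: GDP per capita (USD) and loan sum (million USD)
gdp_toy <- data.frame(
  gdp_per_capita = c(500, 1200, 3000, 8000, 20000),
  loan_sum = c(12, 30, 8, 15, 2)
)

# Linear regression on the log10-transformed variables,
# matching the log10 axes used in the scatter plot
fit <- lm(log10(loan_sum) ~ log10(gdp_per_capita), data = gdp_toy)

# Slope of the fitted line and proportion of variance explained
slope <- unname(coef(fit)[2])
r_squared <- summary(fit)$r.squared
```

A low `r_squared` would be consistent with there being no obvious linear relationship. In a ggplot2 figure with log10 axes, adding `geom_smooth(method = "lm")` draws the equivalent fitted line.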
kiva_loans %>%
  group_by(sector) %>%
  summarise(total_funding = signif(sum(funded_amount) * 10^(-6), 5)) %>%
  ggplot(aes(x = fct_reorder(sector, desc(total_funding)), y = total_funding)) +
  # Represents the data as a column chart
  geom_bar(fill = "cornflowerblue", color = "black", stat = "identity", position = "dodge", width = 0.8) +
  labs(x = "Sectors", y = "Total Funding (Million USD)", title = 'Total Kiva funding per sector') +
  theme_classic() +
  geom_text(aes(label = total_funding), vjust = -0.5, size = 2) +
  theme(axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.4))
Reviewing the above bar plot, we can see that agriculture, food and retail are by far the most funded Kiva sectors. For context, these three sectors received twice as much funding as the remaining 12 sectors. Interestingly, if we review the ‘use’ column of the data frame, we can also see that many loans categorised as retail are in fact loans to purchase food items such as salt, rice and flour.
library(gt)
kiva_loans %>%
  filter(sector == 'Retail') %>%
  select(use) %>%
  slice(1:10) %>%
  gt() %>%
  tab_header(title = "Use data when the loan is categorised as Retail") %>%
  cols_label(use = "Use")
Use data when the loan is categorised as Retail

| Use |
|---|
| to buy stock of rice, sugar and flour . |
| to buy packs of salts, biscuits and beverages. |
| to buy packs of salt, biscuits, and beverages. |
| to buy hair oils to sell. |
| to buy different kinds of knives to sell |
| to buy rice, sugar and flour in bulk. |
| To buy women's shoes to sell |
| to stock his store. |
| to buy additional items like eggs, charcoal, rice, Milo, shampoo, groceries, etc. to sell |
| to purchase body lotions, hair oil, jewelery, chemicals and hair conditioners for resale. |
The conclusions drawn from IQ 1 tell us that Kiva loans are most typically requested in developing countries with low GDP per capita. In these countries access to food is not a given, and the creation of a constant food supply may be able to lift some people out of poverty. In his 2015 report, Robert Townsend states that for the world’s poorest, growth in agriculture is two to four times more effective in raising living standards than growth in the next closest sector (Townsend 2015). This can create a win-win scenario for all stakeholders: so long as the loan is used wisely, borrowers can create more value than they initially borrowed, and funders can see their investment amount to more than its dollar value. As such, it is not surprising that agriculture loans make up 27% of all Kiva loans.
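A share such as the one quoted above can be obtained by counting loans per sector and dividing by the total number of loans. The following is a minimal sketch on a made-up toy data frame (`loans_toy` and its contents are hypothetical); the `sector` column name is taken from the Kiva loans data used earlier.

```r
library(dplyr)

# Hypothetical toy data: one row per loan
loans_toy <- data.frame(
  sector = c("Agriculture", "Agriculture", "Food", "Retail",
             "Retail", "Retail", "Housing", "Education")
)

# Number of loans per sector and each sector's percentage of all loans
sector_share <- loans_toy %>%
  count(sector, name = "n_loans") %>%
  mutate(share_pct = 100 * n_loans / sum(n_loans)) %>%
  arrange(desc(share_pct))
```

This measures the share of loan count. A share of funding would instead sum `funded_amount` per sector before dividing by the overall total.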
A side-by-side bar graph was created to visualise the mean loan amounts per gender in each region. Note that 4,221 records with missing gender information were excluded from the dataset before creating the graph.
## Keeping only rows where borrower_genders is exactly "male" or "female"
genders_clean <- kiva_loans %>%
  filter(!is.na(borrower_genders) & borrower_genders %in% c("male", "female"))
## Finding the mean funded amount for each gender within each country
summary_aggregated <- genders_clean %>%
  group_by(country, borrower_genders) %>%
  summarize(mean_funded_amount = mean(funded_amount), .groups = "drop")
## Making separate male and female columns
summary_aggregated <- summary_aggregated %>%
  mutate(
    males = ifelse(borrower_genders == "male", mean_funded_amount, 0),
    females = ifelse(borrower_genders == "female", mean_funded_amount, 0)
  )
## Collapsing to one row per country
synthesized_data <- summary_aggregated %>%
  group_by(country) %>%
  summarize(male = sum(males), female = sum(females))
## Preparing the country-to-region lookup
countries_clean <- countries %>%
  clean_names() %>%
  subset(select = c("country", "region"))
## Renaming the join column
synthesized_data <- rename(synthesized_data, c("Country" = "country"))
countries_clean <- rename(countries_clean, c("Country" = "country"))
## Combining male and female data with countries data
countries_clean <- inner_join(countries_clean, synthesized_data, by = "Country")
## Regions only
## Note: this sums the country-level means within each region
regions_data <- countries_clean %>%
  group_by(region) %>%
  summarize(male = sum(male), female = sum(female))
## Reshaping to long format for plotting
regions_data <- regions_data %>%
  pivot_longer(cols = c(male, female), names_to = "gender", values_to = "value")
## Plotting side-by-side graph
ggplotly(
  ggplot(data = regions_data, aes(x = region, y = value, fill = gender)) +
    geom_bar(colour = "black", stat = "identity", position = "dodge") +
    scale_fill_manual(values = c("male" = "cornflowerblue", "female" = "pink")) +
    labs(x = "\nRegions", y = "Mean Loan Amount\n") +
    ggtitle("Mean Loan Amount Per Gender in Regions") +
    theme(
      plot.title = element_text(hjust = 0.5),
      axis.title.x = element_text(colour = "black"),
      axis.title.y = element_text(colour = "black"),
      axis.text.x = element_text(angle = 45, hjust = 1)
    )
)
Based on the dataset, for every male borrower there were 3.16 female borrowers. This is reflected in the graph, where females had a higher mean loan amount in 5 out of 9 regions. Latin America and the Caribbean stood out as having the largest gap in mean loan amounts between males and females of all the regions.
This can be attributed to the high levels of gender inequality in this region, as reported in a study by the Inter-American Development Bank (IDB) titled “An Unequal Olympiad: Gender Equity in Latin American and Caribbean Companies” (Basco et al. 2021). The study found that women hold only 15% of management positions, that 35% of women in the workforce have access to advanced technologies, and that 6 out of 10 companies do not provide any type of maternity leave beyond what is required by law. These findings highlight the need for further research and action to address gender inequality in this region and its potential impact on loan amounts.
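A female-to-male borrower ratio like the one quoted above can be computed by counting individual borrowers. The sketch below assumes that each entry lists one or more genders separated by commas (an assumption about the format of `borrower_genders`, since the report does not state how its ratio was calculated) and uses an invented toy vector rather than the Kiva data.

```r
# Hypothetical toy entries: one string per loan, possibly listing several borrowers
genders_toy <- c("female", "male", "female, female, male", NA, "female")

# Drop missing entries, split each string on commas and trim whitespace
entries <- genders_toy[!is.na(genders_toy)]
borrowers <- trimws(unlist(strsplit(entries, ",")))

# Count individual borrowers of each gender and take the ratio
n_female <- sum(borrowers == "female")
n_male <- sum(borrowers == "male")
female_per_male <- n_female / n_male
```

Counting individual borrowers this way includes group loans, whereas filtering to entries that are exactly "male" or "female" keeps only single-borrower loans, so the two approaches can give different ratios.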